hypotheses
- H1: tensions along the dimensions on which things organize: top-down vs. bottom-up, regional vs. local, economic infrastructure vs. environment
- H2: organization type
- H3: psychological distance
- H4: involvement
INTRODUCTION
- ecology of games
- network management
- case of sea level rise
BACKGROUND
- sea level rise as general problem
- sea level rise in case of Bay Area
- regional governance efforts
RATIONALE
- governance as a multi-actor situation without formal hierarchy, relying instead on largely horizontal approaches to steering and decision-making (e.g., Klijn et al. 2010)
- theory about network management (Klijn/Edelenbos, Provan/Kenis), EG framework (Lubell, Berardo), perhaps a little ACF
- preferred management strategies and policy tools as a function of the portfolio of issues someone is concerned about
- research questions:
METHODS AND MATERIALS
Data
- survey design and implementation
- response statistics (sample, response rate, etc.)
- basic survey descriptives
- summary table of actor trait responses (experience, org type, etc.)
- three-panel bar plot of the frequency of choices for each of concerns, barriers, and policies
Figure 1 shows the percentage of respondents who selected each item.
Figure 1: Frequency of response choices in sample
Model
Survey respondents identified up to three items from each of three sets: issues, barriers, and strategies. Thus, each respondent is linked to between 0 and 9 system components. We are interested in both (1) how actors in the governance network assort into different coalitions or preference groups, and (2) how different issues, barriers, and strategies assort into different “policy portfolios”. To analyze both simultaneously, we use a mixture of latent trait analyzers (MLTA) model (Gollini 2021). The MLTA model works like a combination of latent class analysis (LCA) and latent trait analysis (LTA). LCA is a clustering approach that identifies unobserved, underlying groups of respondents based upon observed responses. LTA is essentially a factor analysis for binary (or categorical) data, the goal being to represent respondents’ choices among the 38 concept options on a reduced set of underlying dimensions. In other words, LCA groups respondents and LTA groups responses.
An MLTA model does both–clustering of respondents and factoring of responses–at the same time. Observations are assumed to come from groups of respondents. Response choices (policies/concerns/barriers) are then assumed to be dependent upon group membership and a group-specific D-dimensional continuous latent trait variable. In other words, response variables are modeled as conditional on latent classes and latent traits.
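As a sketch of this response model (the notation below is ours, written to match the general MLTA parameterization rather than copied from a specific source):

$$\Pr\left(x_{nm} = 1 \mid g, \mathbf{y}_n\right) = \frac{1}{1 + \exp\!\left[-\left(b_{mg} + \mathbf{w}_{mg}^{\top}\mathbf{y}_n\right)\right]}$$

where $x_{nm}$ is respondent $n$'s choice of item $m$, $b_{mg}$ is a group-specific intercept, $\mathbf{y}_n$ is the $D$-dimensional latent trait, and $\mathbf{w}_{mg}$ is the item's slope (loading) vector; the fixed-slope variants constrain $\mathbf{w}_{mg} = \mathbf{w}_m$ to be equal across groups.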
Responses are represented as a binary incidence matrix X (N × M), where N is the number of respondents and M is the number of response items. The numbers of groups and trait dimensions are preset prior to model fitting. We thus test the fit of models across a range of group numbers (1 to 5, where G = 1 implies no subgroups) and trait dimensions (0 to 4, where a zero-dimension MLTA is identical to an LCA).
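As an illustration of the data structure, a minimal sketch of building X from long-format survey picks (the column and item names here are hypothetical, not our actual variable names):

```r
# Sketch: build the N x M binary incidence matrix X from long-format
# respondent-item picks. Names are illustrative placeholders.
picks <- data.frame(
  respondent_id = c("r1", "r1", "r2", "r3", "r3", "r3"),
  item          = c("stormwater", "transport", "stormwater",
                    "transport", "funding", "dac_concern")
)
X <- as.matrix(table(picks$respondent_id, picks$item))
X[X > 1] <- 1L   # guard against duplicate picks
dim(X)           # N respondents x M items
```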
Model Fit
| BIC scores by MLTA specification | D = 0 | D = 1 | D = 2 |
|---|---|---|---|
| G = 1 (fixed slope = FALSE) | 23729 | 23779 | 23884 |
| G = 1 (fixed slope = TRUE) | | 23777 | 23883 |
| G = 2 (fixed slope = FALSE) | 23745 | 23967 | 24284 |
| G = 2 (fixed slope = TRUE) | | 23847 | 23981 |
| G = 3 (fixed slope = FALSE) | 23837 | 24234 | 24793 |
| G = 3 (fixed slope = TRUE) | | 23940 | 24085 |

*Fixed slope refers to whether slope parameters are constant across all groups (TRUE) or group-specific (FALSE); slope parameters do not apply when D = 0.*
Table @ref(tab:mlta_results) shows Bayesian information criterion (BIC) goodness-of-fit scores for MLTAs fit to different combinations of group and dimension numbers, with slopes either fixed or varied across subgroups.¹ The simplest, most restrictive model, with no subgroups and no underlying trait dimension, is the most parsimonious. However, given the prevalence of several common responses shown in figure @ref(fig:figure_choice_percentages), it is unsurprising that the single-cluster model performs fairly well. These broadly shared concerns and preferences, such as recognition that sea level rise poses problems for transportation and stormwater and belief that a SLR plan is needed, are associated with a more diverse array of secondary concerns and policy responses that can be teased out with a multidimensional model. Moreover, overall goodness-of-fit scores are fairly similar across specifications, particularly for fixed-slope models, where slope parameters are assumed to be constant across groups. Because we are interested in underlying differences among network subgroups, we apply a further series of factor-analysis and cluster-analysis methods to identify suitable values of D (underlying trait dimensions) and G (number of subgroups) for subsequent analysis.
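For reference, BIC trades off likelihood against model complexity; a minimal sketch of the computation (the numbers below are hypothetical illustrations, not values from our fitted models):

```r
# BIC = -2 * logLik + k * log(n), where k is the number of free
# parameters and n the sample size; lower values indicate better fit.
bic <- function(loglik, k, n) -2 * loglik + k * log(n)

# Hypothetical values for illustration only:
bic(loglik = -11800, k = 38, n = 523)   # about 23838
```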
Identifying underlying trait dimensions
As described above, an LTA model is essentially a factor analysis; thus, we can use typical factor analysis tools for identifying an appropriate number of dimensions. Figure @ref(fig:figure_factor_analysis) plots eigenvalues by factor number along with a series of non-graphical tests meant to identify an optimal number of factors. These tests are not all in agreement, ranging from a recommendation of 2 factors (the statistical “elbow” of the curve) to 17 (the number of factors with an eigenvalue greater than 1). For parsimony, we select the lowest recommended value, D = 2, and fit an MLTA model with two underlying trait dimensions.
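The eigenvalue-greater-than-1 (Kaiser) rule among those tests can be sketched as follows. This uses simulated binary data purely for illustration; with real binary items a tetrachoric correlation matrix would be more appropriate than Pearson correlations:

```r
# Sketch: Kaiser criterion on the eigenvalues of a correlation matrix.
set.seed(1)
X  <- matrix(rbinom(200 * 10, size = 1, prob = 0.4), nrow = 200)
ev <- eigen(cor(X), symmetric = TRUE)$values

sum(ev > 1)   # number of factors retained by the Kaiser rule
```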
Identifying subgroups
We take a similar approach to identifying an appropriate number of groups. Figure @ref(fig:figure_k_means) hierarchically presents a series of k-means clustering results fit to different values of k. The top level shows a single-cluster model (i.e., no subgroups), and the bottom layer shows groups for k = 10. The arrows in figure @ref(fig:figure_k_means) show how respondents change groupings as the number of clusters changes. At high k-values, we observe that the clusters are unstable: respondents who were grouped into separate clusters at lower k-values are reassigned to entirely new groupings. In this regard, k = 3 appears to be a good cluster value; at k = 4, clusters become less stable, while k = 2 appears to mask a distinction between two underlying subgroups.
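A minimal sketch of this stability check, using base R k-means on simulated data: a cross-tabulation of assignments at consecutive k shows whether clusters split cleanly (most respondents stay together) or scatter across new groupings:

```r
# Sketch: compare cluster assignments at consecutive values of k.
set.seed(1)
X <- matrix(rnorm(150 * 2), nrow = 150)

km3 <- kmeans(X, centers = 3, nstart = 25)
km4 <- kmeans(X, centers = 4, nstart = 25)

# Stable solutions concentrate counts in a few cells; unstable ones
# scatter respondents across many cells.
table(k3 = km3$cluster, k4 = km4$cluster)
```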
Given the results presented above, we fit a final MLTA model with G = 3 and D = 2, keeping item-response slopes fixed across subgroups. Using this model, figure @ref(fig:figure_group_probability_map) shows the predicted probability of group membership. For each respondent, the three group probabilities sum to one. Most respondents have a very high probability of being in a single group, with just a few showing more ambiguous predictions. Taking the highest probability for each respondent (across all three groups), the median maximum probability is 0.99, and the minimum is 0.42.
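These membership summaries come directly from the N x G posterior probability matrix; a sketch with a toy matrix (values below are illustrative, not our estimates):

```r
# Sketch: per-respondent maximum membership probability and hard
# assignment from an N x G posterior matrix (rows sum to 1).
post <- rbind(c(0.99, 0.005, 0.005),
              c(0.42, 0.38,  0.20),
              c(0.10, 0.85,  0.05))

max_prob   <- apply(post, 1, max)        # confidence of best assignment
assignment <- apply(post, 1, which.max)  # hard group label

median(max_prob)   # 0.85 for this toy matrix
```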
Just as respondents are clustered within groups, concepts are linked to underlying trait dimensions. We can visualize the strength of these linkages by plotting the slopes of the logistic response functions. These are interpreted similarly to loadings in a factor analysis: a slope near 1 or -1 indicates a strong relationship between the variable and the underlying trait dimension, and a slope near 0 indicates a weak relationship.
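To make the interpretation concrete, a sketch of one logistic item-response function with toy intercept, slope, and trait values (none taken from our fitted model):

```r
# Sketch: selection probability for one item given a respondent's
# position y on a D = 2 latent trait, intercept b, and slopes w.
b <- -1.0
w <- c(0.9, -0.1)   # strong loading on dimension 1, weak on dimension 2
y <- c(1.5,  0.0)   # respondent position in the latent space

plogis(b + sum(w * y))   # about 0.587
```

Moving along dimension 1 changes this item's selection probability substantially; moving along dimension 2 barely changes it, which is what a near-zero loading means.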
RESULTS
Correlation between external items and concept responses
See google doc with results from external items and group correlations here:
https://docs.google.com/document/d/1uz421G2w8q_PJQNSWrB-HOETRNiQCUw-MLa6p3zfwaw/edit?usp=sharing
Using Spearman’s rho, probability of Group 1 membership is significantly correlated with more SLR-related work tasks (including project management and outreach), membership in a CBO or NGO, and greater concern for both short- and long-term SLR impacts. It is also correlated with anticipating SLR impacts sooner in time, and negatively correlated with membership in a water special district or local government organization.
Group 2 membership is significantly correlated with a higher level of involvement with SLR and with executive job tasks. It is also correlated with environmental and water special district organizations, and negatively correlated with NGO, CBO, and state government organizations. This group shows higher short-term awareness of SLR impacts but lower long-term concern, and a higher assessment of regional agreement on risks.
Group 3 is significantly correlated with local government organizations, but shows correlations with lower involvement levels, fewer SLR-related work tasks (particularly project management and outreach), and fewer information sources used in SLR work. It is negatively correlated with membership in an environmental special district or NGO, and correlated with lower short- and long-term SLR awareness, lower short- and long-term SLR concern, and later anticipated timing of SLR impacts, as well as a lower assessment of regional agreement on risks.
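These tests can be sketched as follows; both vectors below are simulated stand-ins for the actual survey variables:

```r
# Sketch: Spearman correlation between Group 1 membership probability
# and a count-valued external item (e.g., number of SLR work tasks).
set.seed(1)
p_group1 <- runif(50)                          # stand-in membership probabilities
n_tasks  <- sample(0:6, 50, replace = TRUE)    # stand-in task counts

# exact = FALSE avoids the exact-p warning when the data contain ties.
cor.test(p_group1, n_tasks, method = "spearman", exact = FALSE)
```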
Respondents mapped on latent variables
We can map each individual survey respondent on the (D = 2) latent variables. However, because the model probabilistically clusters respondents by group (G = 3), what is actually estimated is the posterior mean for each individual on each latent variable conditional on being in group G. We therefore use the group assignment probabilities to select the “best” posterior means for each respondent.
The resulting picture is noisy. There is clearly variance in positioning within and between groups, but the groups themselves do not correspond directly to positions in the latent space. To some extent, that makes sense: groups and latent dimensions are both estimated from responses, but they are not constrained to align, so the latent-dimension solution and the respondent-grouping solution can organize the data differently.
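A sketch of the selection step described above, with toy dimensions; `post` and `mu` are hypothetical stand-ins for the fitted model's output objects:

```r
# Sketch: for each respondent, keep the latent-trait posterior mean
# from their most probable group. 'post' is N x G membership
# probabilities; 'mu' is an N x D x G array of group-conditional means.
set.seed(1)
N <- 3; D <- 2; G <- 2
post <- rbind(c(0.9, 0.1), c(0.3, 0.7), c(0.6, 0.4))
mu   <- array(rnorm(N * D * G), dim = c(N, D, G))

best  <- apply(post, 1, which.max)   # most probable group per respondent
y_hat <- t(sapply(seq_len(N), function(i) mu[i, , best[i]]))

dim(y_hat)   # N x D matrix of 'best' latent positions
```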
[Note: here is an example of how we can try to get at this variance. The code below plots D1 and D2 position by org type. I don’t know enough about the other survey items, but Kyra, you can use this code (and the above plot) to see how different actor-level variables play out.]
groups mapped to item-response loadings
We can then evaluate the intercept terms in each logistic response function, ignoring the slope parameters, to identify which items load heavily by group. Here, it is clear that some items are identified by both groups (e.g., stormwater, transportation) while others are more exclusively related to one and not the other (e.g., concern about DACs). Some are rare in both, like “commercial” and “property value”. This also begins to reveal why increased dimensionality doesn’t seem to help model fit very much: many items are about as likely to be selected by one group as the other. Heuristically, items below the dashed line are more strongly associated with Group 1, and items above it with Group 2.
```r
# Requires: data.table, reshape2, ggplot2, ggalluvial
library(data.table)
library(ggplot2)
library(ggalluvial)

# keepConcepts = network.vertex.names(bip_net)[{bip_net %v% 'Concept_Type'} %in% c('Policy','Concern')]

# Count concern-policy co-occurrences within each group's responses
coFreq_all = rbindlist(lapply(1:G, function(g) {
  coFreq = data.table(reshape2::melt(crossprod(Y[g_index == g, ])))
  coFreq$Var1_Type = {bip_net %v% 'Concept_Type'}[match(coFreq$Var1, bip_net %v% 'vertex.names')]
  coFreq$Var2_Type = {bip_net %v% 'Concept_Type'}[match(coFreq$Var2, bip_net %v% 'vertex.names')]
  coFreq = coFreq[Var1_Type == 'Concern' & Var2_Type == 'Policy', ]
  coFreq$group = paste0('G', g)
  coFreq
}))
coFreq_all$Var1 <- as.character(coFreq_all$Var1)
coFreq_all$Var2 <- as.character(coFreq_all$Var2)
coFreq_all <- coFreq_all[order(Var1, Var2), ][value > 0, ]

# Alluvial plot of concern-policy co-occurrence, pooled across groups
ggplot(coFreq_all[, sum(value), by = .(Var1, Var2, Var1_Type, Var2_Type)],
       aes(y = V1, axis1 = Var1, axis2 = Var2)) +
  geom_alluvium(width = 1/12, aes(alpha = V1), fill = 'grey20') +
  geom_stratum(fill = "black", colour = 'grey', width = .1) +
  ggtitle('Concern and policy co-occurrence') +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Concern", "Policy"), expand = c(.05, .05)) +
  guides(alpha = "none")
```
[This is the same plot, but faceted so each group is separate.]
more results
DISCUSSION
CONCLUSION
¹ Briefly, slope parameters reflect within-group (for fixed slope = FALSE) or overall (for fixed slope = TRUE) heterogeneity for a given response variable: large slope parameters reflect larger differences in the probability of response between group members. Slope parameters also reflect interdependence between responses: two positive slopes mean that two items have a simultaneous probability of selection greater than the group median more often than would be expected if the items were locally independent (Gollini and Murphy 2014).